Automatically Identifying Records from the Extracted Data Fields of Genealogical Microfilm
Abstract
Imagine being able to search thousands of microfilm documents at once. Extracting and organizing this information by hand, however, is nearly impossible. This research proposes an algorithmic process that automatically identifies and extracts records from tables found in microfilm images. The algorithm derives record patterns from knowledge of table structure and geometry together with a genealogical ontology. It takes as input raw data collected from zoned microfilm images, supplied as an XML file that describes the coordinates of each table cell, the printed text in each cell (if any), and whether the cell is empty. The output is a set of record patterns that express the geometry and attributes of the records within a table.
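As a rough illustration of the kind of input the abstract describes, the sketch below parses a small XML fragment of table cells and groups them into rows by vertical position. The element and attribute names (`table`, `cell`, `x`, `y`, `width`, `height`, `empty`) are assumptions made for this example, since the abstract does not give the actual schema. The row grouping is a simple geometric heuristic, not the record-pattern algorithm proposed in the paper.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Cell:
    x: int
    y: int
    width: int
    height: int
    text: str
    empty: bool


def parse_cells(xml_text):
    """Read cells from the hypothetical <table>/<cell> layout assumed here."""
    root = ET.fromstring(xml_text)
    cells = []
    for node in root.iter("cell"):
        cells.append(Cell(
            x=int(node.get("x")),
            y=int(node.get("y")),
            width=int(node.get("width")),
            height=int(node.get("height")),
            text=(node.text or "").strip(),
            empty=node.get("empty") == "true",
        ))
    return cells


def group_rows(cells, tolerance=5):
    """Group cells whose top edge lies within `tolerance` of the row's first cell."""
    rows = []
    for cell in sorted(cells, key=lambda c: (c.y, c.x)):
        if rows and abs(cell.y - rows[-1][0].y) <= tolerance:
            rows[-1].append(cell)
        else:
            rows.append([cell])
    for row in rows:
        row.sort(key=lambda c: c.x)  # left-to-right order within each row
    return rows


# Invented sample data: one printed header row and one empty row to be filled in.
SAMPLE = """
<table>
  <cell x="0" y="0" width="100" height="20" empty="false">Name</cell>
  <cell x="100" y="1" width="100" height="20" empty="false">Birth Date</cell>
  <cell x="0" y="20" width="100" height="20" empty="true"></cell>
  <cell x="100" y="21" width="100" height="20" empty="true"></cell>
</table>
"""

rows = group_rows(parse_cells(SAMPLE))
header = [c.text for c in rows[0]]
```

Here `header` holds the printed labels of the first row, and the second row consists only of empty cells. A record-identification step could treat such a pairing of printed labels and empty cells as a candidate record layout, though how the paper actually does this is not stated in the abstract.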
Similar resources
A System to Automatically Index
Introduction: Millions of rolls of microfilm contain valuable genealogical information, yet remain largely inaccessible. A titleboard (see Figure 1) contains semi-structured metadata about the genealogical records that follow the titleboard on the roll of microfilm. This metadata may include the geographical origin of the records (i.e. city and country), the record type (i.e. birth records, marr...
Record Linkage for Genealogical Databases
In this paper we describe past experience and outline current directions in performing record linkage over large genealogical databases. 1. INTRODUCTION AND MOTIVATION Record linkage is the problem of identifying multiple records that refer to the same real-world entity. In genealogical databases, it is the problem of identifying when individuals situated in different pedigrees refer to the sam...
Combining pattern recognition and deep-learning-based algorithms to automatically detect commercial quadcopters using audio signals (Research Article)
Commercial quadcopters with many private, commercial, and public sector applications are a rapidly advancing technology. Currently, there is no guarantee to facilitate the safe operation of these devices in the community. Three different automatic commercial quadcopters identification methods are presented in this paper. Among these three techniques, two are based on deep neural networks in whi...
Automatic Wrapper Generation Using Tree Matching and Partial Tree Alignment
This paper is concerned with the problem of structured data extraction from Web pages. The objective of the research is to automatically segment data records in a page, extract data items/fields from these records and store the extracted data in a database. In this paper, we first introduce the extraction problem, and then discuss the main existing approaches and their limitations. After that, ...
Analysis of Loan and Circulation Transactions in the Libraries of Birjand University of Medical Sciences Using Data Mining Algorithms
Introduction: Data mining is a process for discovering meaningful relationships and patterns in data. Identifying the behavior patterns of library users can help improve decision-making in libraries. This study aimed to analyze the interlibrary loan transactions at Birjand University of Medical Sciences using data mining algorithms. Methods: In this descriptive study, knowledge discovery and d...
Publication date: 2001